lets_plot.geom_boxplot

lets_plot.geom_boxplot(mapping=None, *, data=None, stat=None, position=None, show_legend=None, sampling=None, tooltips=None, fatten=None, outlier_color=None, outlier_fill=None, outlier_shape=None, outlier_size=None, varwidth=None, **other_args)

Display the distribution of data based on a five number summary (“minimum”, first quartile (Q1), median, third quartile (Q3), and “maximum”), and “outlying” points individually.

Parameters
  • mapping (FeatureSpec) – Set of aesthetic mappings created by aes() function. Aesthetic mappings describe the way that variables in the data are mapped to plot “aesthetics”.

  • data (dict or DataFrame) – The data to be displayed in this layer. If None, the default, the data is inherited from the plot data as specified in the call to ggplot.

  • stat (str, default=‘boxplot’) – The statistical transformation to use on the data for this layer, as a string.

  • position (str or FeatureSpec) – Position adjustment, either as a string (‘identity’, ‘stack’, ‘dodge’, …), or the result of a call to a position adjustment function.

  • show_legend (bool, default=True) – If False, do not show the legend for this layer.

  • sampling (FeatureSpec) – Result of the call to the sampling_xxx() function. Value None (or ‘none’) will disable sampling for this layer.

  • tooltips (layer_tooltips) – Result of the call to the layer_tooltips() function. Specifies the appearance, style and content of the tooltips.

  • fatten (float, default=1.0) – A multiplicative factor applied to the size of the middle bar (the median line).

  • outlier_color (str) – Default color aesthetic for outliers.

  • outlier_fill (str) – Default fill aesthetic for outliers.

  • outlier_shape (int) – Default shape aesthetic for outliers.

  • outlier_size (float) – Default size aesthetic for outliers.

  • varwidth (bool, default=False) – If False, make a standard box plot. If True, draw boxes with widths proportional to the square roots of the number of observations in each group.

  • other_args – Other arguments passed on to the layer. These are often aesthetic settings used to set an aesthetic to a fixed value, like color=‘red’, fill=‘blue’, size=3 or shape=21. They may also be parameters to the paired geom/stat.
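
With varwidth=True, each box width scales with the square root of its group size. The sketch below illustrates that scaling in plain Python. It assumes (as ggplot2 does) that widths are normalized so the largest group gets the full width; the exact normalization Lets-Plot applies is an assumption here, and relative_box_widths is a hypothetical helper, not part of the library.

```python
import math
from collections import Counter


def relative_box_widths(groups, width=1.0):
    """Hypothetical helper: box width per group when varwidth=True.

    Widths are proportional to sqrt(n) for each group. Normalizing by the
    largest sqrt(n), so the biggest group gets `width`, is an assumption.
    """
    counts = Counter(groups)
    max_root = max(math.sqrt(n) for n in counts.values())
    return {g: width * math.sqrt(n) / max_root for g, n in counts.items()}


# Group 'b' has four times as many observations as 'a' and 'c',
# so its box is twice as wide.
widths = relative_box_widths(['a', 'b', 'b', 'b', 'b', 'c'])
print(widths)  # {'a': 0.5, 'b': 1.0, 'c': 0.5}
```

Using the square root rather than the raw count keeps very large groups from visually swamping small ones.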

Returns

Geom object specification.

Return type

LayerSpec

Note

Computed variables:

  • ..lower.. : lower hinge, 25% quantile.

  • ..middle.. : median, 50% quantile.

  • ..upper.. : upper hinge, 75% quantile.

  • ..ymin.. : lower whisker = smallest observation greater than or equal to lower hinge - 1.5 * IQR, where IQR = upper hinge - lower hinge is the interquartile range.

  • ..ymax.. : upper whisker = largest observation less than or equal to upper hinge + 1.5 * IQR.
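
The whisker and outlier rule above can be sketched in plain Python for a single group. This is a minimal sketch, assuming quantiles are computed by linear interpolation (NumPy's default method); the quantile method Lets-Plot actually uses is an assumption, and quantile and boxplot_stats are hypothetical helpers, not library functions. Observations beyond the 1.5 * IQR fences are the outliers drawn as individual points.

```python
import math


def quantile(sorted_vals, q):
    """Linear-interpolation quantile (same rule as NumPy's default method)."""
    pos = (len(sorted_vals) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * frac


def boxplot_stats(values, coef=1.5):
    """Hypothetical sketch of the computed variables for one group."""
    s = sorted(values)
    lower, middle, upper = (quantile(s, q) for q in (0.25, 0.5, 0.75))
    iqr = upper - lower
    lo_fence = lower - coef * iqr
    hi_fence = upper + coef * iqr
    # Whiskers end at the most extreme observations still inside the fences.
    ymin = min(v for v in s if v >= lo_fence)
    ymax = max(v for v in s if v <= hi_fence)
    # Everything outside the fences is drawn individually as an outlier.
    outliers = [v for v in s if v < lo_fence or v > hi_fence]
    return {'lower': lower, 'middle': middle, 'upper': upper,
            'ymin': ymin, 'ymax': ymax, 'outliers': outliers}


print(boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
# {'lower': 3.25, 'middle': 5.5, 'upper': 7.75, 'ymin': 1, 'ymax': 9, 'outliers': [100]}
```

Here IQR = 4.5, so the upper fence is 7.75 + 6.75 = 14.5; the value 100 lies beyond it and the upper whisker stops at 9.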

geom_boxplot() understands the following aesthetic mappings:

  • lower : lower hinge.

  • middle : median.

  • upper : upper hinge.

  • ymin : lower whisker.

  • ymax : upper whisker.

  • alpha : transparency level of a layer. Accepts numbers between 0 and 1.

  • color (colour) : color of the geometry lines. Can be continuous or discrete. For a continuous value, this will be a color gradient between two colors.

  • fill : fill color of the geometry.

  • size : line width.

  • linetype : type of the border line. Codes and names: 0 = ‘blank’, 1 = ‘solid’, 2 = ‘dashed’, 3 = ‘dotted’, 4 = ‘dotdash’, 5 = ‘longdash’, 6 = ‘twodash’.

  • width : width of the box [0..1].
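
Since linetype accepts either a numeric code or a name, a small lookup table built from the list above can normalize user input. This is a minimal sketch; LINETYPES and linetype_name are hypothetical names, not part of Lets-Plot.

```python
# Linetype codes and names as listed above for geom_boxplot().
LINETYPES = {0: 'blank', 1: 'solid', 2: 'dashed', 3: 'dotted',
             4: 'dotdash', 5: 'longdash', 6: 'twodash'}


def linetype_name(value):
    """Hypothetical helper: normalize a linetype code or name to its name."""
    if isinstance(value, int):
        return LINETYPES[value]
    if value in LINETYPES.values():
        return value
    raise ValueError(f"unknown linetype: {value!r}")


print(linetype_name(2), linetype_name('dotdash'))  # dashed dotdash
```

Either form can be passed as a fixed setting through other_args, for example geom_boxplot(linetype='dashed') or geom_boxplot(linetype=2).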

Examples

import numpy as np
from lets_plot import *
LetsPlot.setup_html()
n = 100
np.random.seed(42)
x = np.random.choice(['a', 'b', 'c'], size=n)
y = np.random.normal(size=n)
ggplot({'x': x, 'y': y}, aes(x='x', y='y')) + \
    geom_boxplot()

import numpy as np
from lets_plot import *
LetsPlot.setup_html()
n = 100
np.random.seed(42)
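# 'b' appears twice among the choices, so group b gets roughly twice as many
# observations and, with varwidth=True, a wider box.
x = np.random.choice(['a', 'b', 'b', 'c'], size=n)
y = np.random.normal(size=n)
ggplot({'x': x, 'y': y}, aes(x='x', y='y')) + \
    geom_boxplot(fatten=5, varwidth=True,
                 outlier_shape=8, outlier_size=5)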

import numpy as np
import pandas as pd
from lets_plot import *
LetsPlot.setup_html()
n = 100
np.random.seed(42)
x = np.random.choice(['a', 'b', 'c'], size=n)
y = np.random.normal(size=n)
df = pd.DataFrame({'x': x, 'y': y})
# Precompute custom per-group statistics (1/3 and 2/3 quantiles instead of
# quartiles) and draw them as-is with stat='identity'.
agg_df = df.groupby('x').agg({'y': [
    'min', lambda s: np.quantile(s, 1/3),
    'median', lambda s: np.quantile(s, 2/3), 'max'
]}).reset_index()
agg_df.columns = ['x', 'y0', 'y33', 'y50', 'y66', 'y100']
ggplot(agg_df, aes(x='x')) + \
    geom_boxplot(aes(ymin='y0', lower='y33', middle='y50',
                     upper='y66', ymax='y100'), stat='identity')

import numpy as np
import pandas as pd
from lets_plot import *
LetsPlot.setup_html()
n, m = 100, 5
np.random.seed(42)
df = pd.DataFrame({'x%s' % i: np.random.normal(size=n)
                   for i in range(1, m + 1)})
# melt() reshapes the wide frame into long 'variable' / 'value' columns.
ggplot(df.melt()) + \
    geom_boxplot(aes(x='variable', y='value', color='variable',
                     fill='variable', outlier_color='variable'),
                 outlier_shape=21, outlier_size=4, size=2,
                 alpha=.5, width=.5, show_legend=False)